Audio-visual speaker localization for car navigation systems

Authors

  • Xianxian Zhang
  • Kazuya Takeda
  • John H. L. Hansen
  • Toshiki Maeno
Abstract

Human-computer interaction for in-vehicle information and navigation systems is a challenging problem because of the diverse and changing acoustic environments. It is proposed that the integration of video and audio information can significantly improve dialog system performance, since the visual modality is not impacted by acoustic noise. In this paper, we propose a robust audio-visual integration system for source tracking and speech enhancement for an in-vehicle speech dialog system. The proposed system integrates both audio and visual information to locate the desired speaker source. Using real data collected in car environments, the proposed system can improve speech accuracy by up to 40.75% compared with audio data alone.

Similar resources

AV@CAR: A Spanish Multichannel Multimodal Corpus for In-Vehicle Automatic Audio-Visual Speech Recognition

This paper describes the acquisition of the multichannel multimodal database AV@CAR for automatic audio-visual speech recognition in cars. Automatic speech recognition (ASR) plays an important role inside vehicles in keeping the driver from being distracted. It is also known that visual information (lip-reading) can improve ASR accuracy under adverse conditions such as those within a car. The corpus...

Measuring Sound Experience with In-Vehicle Speaker Systems

Measuring the perceived quality of speaker systems in the car together with potential users is important to evaluate new speaker developments. Existing sound perception questionnaires suffer from being difficult to answer for human beings because audio perceptions are, for the average user, hard to express in words. In this paper, we present a questionnaire that focuses on user experiences that...

Human-Robot Interaction in Real Environments by Audio-Visual Integration

In this paper, we developed not only a reliable sound localization system including a VAD (Voice Activity Detection) component using three microphones but also a face tracking system using a vision camera. Moreover, we proposed a way to integrate three systems in the human-robot interaction to compensate errors in the localization of a speaker and to reject unnecessary speech or noise signals e...

Vision-Based Vehicle Localization Using a Visual Street Map with Embedded SURF Scale

Accurate vehicle positioning is important not only for in-car navigation systems but is also a requirement for emerging autonomous driving methods. Consumer level GPS are inaccurate in a number of driving environments such as in tunnels or areas where tall buildings cause satellite shadowing. Current vision-based methods typically rely on the integration of multiple sensors or fundamental matri...

AV16.3: An Audio-Visual Corpus for Speaker Localization and Tracking

Assessing the quality of a speaker localization or tracking algorithm on a few short examples is difficult, especially when the groundtruth is absent or not well defined. One step towards systematic performance evaluation of such algorithms is to provide time-continuous speaker location annotation over a series of real recordings, covering various test cases. Areas of interest include audio, vi...

Publication date: 2004